135 research outputs found

    Transformer Uncertainty Estimation with Hierarchical Stochastic Attention

    Transformers are state-of-the-art in a wide range of NLP tasks and have also been applied to many real-world products. Understanding the reliability and certainty of transformer model predictions is crucial for building trustworthy machine learning applications, e.g., medical diagnosis. Although many recent transformer extensions have been proposed, the uncertainty estimation of transformer models remains under-explored. In this work, we propose a novel way to give transformers the capability of uncertainty estimation while retaining the original predictive performance. This is achieved by learning a hierarchical stochastic self-attention that attends to values and a set of learnable centroids, respectively. New attention heads are then formed with a mixture of sampled centroids using the Gumbel-Softmax trick. We theoretically show that the self-attention approximation by sampling from a Gumbel distribution is upper bounded. We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets. The experimental results demonstrate that our approach: (1) achieves the best predictive performance and uncertainty trade-off among compared methods; (2) exhibits very competitive (in most cases, improved) predictive performance on ID datasets; (3) is on par with Monte Carlo dropout and ensemble methods in uncertainty estimation on OOD datasets. Comment: AAAI 202
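
    The Gumbel-Softmax trick named in this abstract can be illustrated with a minimal, generic sketch. It is not the paper's hierarchical stochastic attention: the function name `gumbel_softmax`, the example logits and the temperature are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed (soft) categorical sample from unnormalized logits.

    Illustrative helper, not taken from the paper: adds Gumbel(0, 1) noise
    to each logit and applies a temperature-scaled softmax.
    """
    noisy = []
    for logit in logits:
        u = rng.random()
        # Keep u strictly inside (0, 1) so both logarithms are defined.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)

    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores of one query against three centroids.
rng = random.Random(0)
weights = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
print(weights)
```

    Lower temperatures push the sampled weights toward a one-hot choice, while higher temperatures spread them more evenly. Repeated calls give different samples, and that variation is what a stochastic attention layer could use as an uncertainty signal.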

    Relation between the germination and infection ratio on Sida hermaphrodita L. Rusby seeds under hot water treatment

    Sida hermaphrodita, or Virginia mallow, is a promising perennial herb in the Malvaceae family that can yield a biomass crop for ten to twenty years. The plant also has many other uses: it can serve as a fodder crop, a honey crop, or an ornamental plant in public gardens. Its favorable features include fast growth and resistance to disease and climatic fluctuations. Sida is at an early stage of domestication and therefore has a serious disadvantage: low and slow germination, as in many wild plants. Because of the markedly low germination percentage, the sowing rate of the seed drill has to be tenfold, 200 thousand seeds/acre instead of 10-20 thousand, a quantity that is not available and is expensive. The practical purpose of our seed physiology research was therefore to increase the germination percentage in an available, essentially wild Sida population. We examined two factors related to germination percentage and germination power: the influence of hot water treatment and the effect of exogenous or endogenous seed infection. In our germination tests, using seeds scarified with hot water (65, 80 and 95 °C), 29.33 to 46% of the seeds collected from the S. hermaphrodita population in Debrecen germinated. The average germination across all seasons was 5-10% without treatment and rose to almost 50% with hot water. When physical scarification was used, the oldest seeds showed the best germination (46%) after the hot water treatment, in contrast to previous studies (Spooner 1985; Chudik et al. 2010; Doliński R. 2009). We found a close relationship between seed collection time and infection, as well as germination percentage. The 2009 season was the most favourable with respect to contamination (control: 17.33 and 80 °C treatment: 0%) as well as germination percentage. We conclude that 2009 was the best season in our findings because of the autumn harvest of the Sida seeds. In our opinion, autumn harvesting should be the best way to overcome the problem of low germination and high infection percentage.

    Recognizing speculative language in biomedical research articles: a linguistically motivated perspective

    We explore a linguistically motivated approach to the problem of recognizing speculative language (“hedging”) in biomedical research articles. We describe a method, which draws on prior linguistic work as well as existing lexical resources and extends them by introducing syntactic patterns and a simple weighting scheme to estimate the speculation level of the sentences. We show that speculative language can be recognized successfully with such an approach, discuss some shortcomings of the method and point out future research possibilities.

    Automatic de-identification of textual documents in the electronic health record: a review of recent research

    BACKGROUND: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Internal Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually, and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. METHODS: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. RESULTS: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. CONCLUSIONS: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.

    Circulating syndecan-1 is associated with chemotherapy-resistance in castration-resistant prostate cancer

    OBJECTIVES: Docetaxel chemotherapy is a standard treatment for castration-resistant prostate cancer (CRPC). Rapidly expanding treatment options for CRPC provide reasonable alternatives for those who are resistant to docetaxel. Therefore, prediction of docetaxel resistance has become of great clinical importance. Syndecan-1 (SDC1) has recently been shown to be involved in chemotherapy resistance in various malignancies including prostate cancer. The predictive value of serum SDC1 level has not yet been evaluated. PATIENTS AND METHODS: We assessed the baseline levels of SDC1 in serum samples of 75 patients with CRPC who received docetaxel therapy until the appearance of therapy resistance. In one patient who was treated with three treatment series, we also assessed 6 additional serum samples collected during a 1-year treatment period. Serum SDC1 levels were correlated with clinical outcomes as well as with serum levels of MMP7. RESULTS: Pretreatment SDC1 serum levels were not associated with patients' age or the presence of bone or visceral metastases. In univariable analyses, patients' performance status, the presence of bone or visceral metastases, high pretreatment prostate specific antigen and SDC1 levels were significantly associated with cancer-specific survival. In multivariable analysis, patients' performance status (P = 0.005), presence of bone or visceral metastases (P = 0.013) and high SDC1 level (P = 0.045) remained independent predictors of patients' survival. In the patient with available follow-up samples, serum SDC1 level increased from 50 to 300 ng/ml at radiographic progression. Serum concentrations of SDC1 were correlated with those of MMP7 (r = 0.420, P = 0.006). CONCLUSIONS: Our present results together with currently published data suggest a role for SDC1 shedding in chemotherapy resistance. Determination of serum SDC1 may contribute to the prediction of docetaxel resistance and therefore may help to facilitate clinical decision-making regarding the type and timing of therapy for patients with CRPC.

    De-identification of primary care electronic medical records free-text data in Ontario, Canada

    BACKGROUND: Electronic medical records (EMRs) represent a potentially rich source of health information for research, but the free text in EMRs often contains identifying information. While de-identification tools have been developed for free text, none have been developed or tested for the full range of primary care EMR data. METHODS: We used the deid open source de-identification software and modified it for an Ontario context for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 different physicians from 7 different practices in 5 different cities, and 500 free-text records from a group practice in a different city than the group practice used for the training set. We measured the sensitivity/recall, precision, specificity, accuracy and F-measure of the modified tool against manually tagged free-text records to remove patient and physician names, locations, addresses, medical record, health card and telephone numbers. RESULTS: We found that the modified training program performed with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84 for the first and second validation sets respectively. CONCLUSION: The deid program can be modified to reasonably accurately de-identify free-text primary care EMR records while preserving clinical content.
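
    As a quick consistency check on the figures above, the sketch below recomputes the F-measure from the reported precision and sensitivity/recall. It assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state; the names `f_measure`, `reported` and `f_scores` are illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, sensitivity/recall) pairs as reported in the abstract.
reported = {
    "training": (0.913, 0.883),
    "validation 1": (0.911, 0.867),
    "validation 2": (0.874, 0.802),
}

# Round to two decimals, the precision used in the abstract.
f_scores = {name: round(f_measure(p, r), 2) for name, (p, r) in reported.items()}
print(f_scores)
```

    Under that assumption the recomputed values round to 0.90, 0.89 and 0.84, matching the F-measures reported for the training set and the two validation sets.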